ReAS: Recovery of Ancestral Sequences for Transposable Elements from the Unassembled Reads of a Whole Genome Shotgun
Abstract
We describe an algorithm, ReAS, to recover ancestral sequences for transposable elements (TEs) from the unassembled reads of a whole genome shotgun. The main assumptions are that these TEs must exist at high copy numbers across the genome and must not be so old that they are no longer recognizable in comparison to their ancestral sequences. Tested on the japonica rice genome, ReAS was able to reconstruct all of the high copy sequences in the Repbase repository of known TEs, and increase the effectiveness of RepeatMasker in identifying TEs from genome sequences.
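The first assumption above, that recoverable TEs exist at high copy numbers, implies that short subsequences of a TE should recur across many unassembled reads. The following is a minimal illustrative sketch of that idea only, not the ReAS implementation: the function names, the k-mer length `k`, and the `min_count` threshold are all hypothetical, and the sketch ignores reverse complements, sequencing errors, and the expected background depth of a real shotgun.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def high_copy_kmers(reads, k, min_count):
    """Return the k-mers seen at least min_count times (hypothetical threshold)."""
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n >= min_count}


def reads_with_repeats(reads, k, min_count):
    """Keep the reads that contain at least one high-copy k-mer."""
    repeats = high_copy_kmers(reads, k, min_count)
    return [
        read
        for read in reads
        if any(read[i:i + k] in repeats for i in range(len(read) - k + 1))
    ]


if __name__ == "__main__":
    # Toy reads: three share a repeated segment, one does not.
    toy_reads = ["ACGTACGTTT", "GGACGTACGA", "TTTTGGGCCC", "CCACGTACGT"]
    print(sorted(high_copy_kmers(toy_reads, k=6, min_count=3)))
    print(reads_with_repeats(toy_reads, k=6, min_count=3))
```

In a real data set the threshold would have to be chosen relative to the sequencing depth, since single-copy regions are also covered by several reads; the fixed `min_count` here is only for illustration.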
Related articles
Comparing the whole-genome-shotgun and map-based sequences of the rice genome.
The rice genome has now been sequenced using whole-genome-shotgun and map-based methods. The relative merits of the two methods are the subject of debate, as they were in the human genome project. In this Opinion article, we will show that the serious discrepancies between the resultant sequences are mostly found in the large transposable elements such as copia and gypsy that populate the inter...
Under-representation of repetitive sequences in whole-genome shotgun sequence databases: an illustration using a recently acquired transposable element.
It is widely accepted in a conceptual framework that repetitive sequences, especially those with high sequence homogeneity among copies, tend to be under-represented in whole-genome shotgun sequence databases, because of the difficulty of assembling sequence reads into contigs. Although this is easily inferred, there is no quantitative illustration of this phenomenon. An example using a current...
TERMINUS - Telomeric End-Read Mining IN Unassembled Sequences
TERMINUS is a set of tools to map telomeres on draft sequences of whole genome shotgun sequencing projects. It mines raw sequence reads (from a trace archive) for telomeric reads, assembles them into contigs representing individual chromosome ends and BLASTs the resulting consensus sequences against the genome assembly to identify telomere-proximal genomic contigs. Finally, it estima...
High-throughput sequencing and graph-based cluster analysis facilitate microsatellite development from a highly complex genome
Despite recent advances in high-throughput sequencing, difficulties are often encountered when developing microsatellites for species with large and complex genomes. This probably reflects the close association in many species of microsatellites with cryptic repetitive elements. We therefore developed a novel approach for isolating polymorphic microsatellites from the club-legged grasshopper (G...
Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium.
The DNA content of eukaryotic nuclei (C-value) varies approximately 200,000-fold, but there is only an approximately 20-fold variation in the number of protein-coding genes. Hence, most C-value variation is ascribed to the repetitive fraction, although little is known about the evolutionary dynamics of the specific components that lead to genome size variation. To understand the modes and mechan...
Journal title:
- PLoS Computational Biology
Volume 1
Publication date: 2005